Smart Paper Notes

LegalRelectra: Mixed-domain Language Modeling for Long-range Legal Text Comprehension

Wenyue Hua, Yuchen Zhang, Zhe Chen, Josie Li, Melanie Weber

Category: Natural Language Processing

2022-12-16

The application of Natural Language Processing (NLP) to specialized domains, such as the law, has recently received a surge of interest. As many legal services rely on processing and analyzing large collections of documents, automating such tasks with NLP tools emerges as a key challenge. Many popular language models, such as BERT or RoBERTa, are general-purpose models, which have limitations on processing specialized legal terminology and syntax. In addition, legal documents may contain specialized vocabulary from other domains, such as medical terminology in personal injury text. Here, we propose LegalRelectra, a legal-domain language model that is trained on mixed-domain legal and medical corpora. We show that our model improves over general-domain and single-domain medical and legal language models when processing mixed-domain (personal injury) text. Our training architecture implements the Electra framework, but utilizes Reformer instead of BERT for its generator and discriminator. We show that this improves the model's performance on processing long passages and results in better long-range text comprehension.

translated by Google Translate
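
The LegalRelectra abstract above builds on the Electra framework, in which a generator fills in masked tokens and a discriminator predicts, for each token, whether it was replaced. The sketch below only shows how the discriminator's per-token targets could be derived; the token lists are hypothetical, and nothing here reflects the Reformer-based generator or discriminator the authors actually use.

```python
def replaced_token_labels(original_tokens, corrupted_tokens):
    """Return 1 where the token differs from the original, 0 elsewhere.

    A position where the generator happens to reproduce the original
    token is labeled 0 (original), as in Electra.
    """
    if len(original_tokens) != len(corrupted_tokens):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original_tokens, corrupted_tokens)]


# Hypothetical example: the generator replaced one masked token.
original = ["the", "plaintiff", "suffered", "a", "fracture"]
corrupted = ["the", "plaintiff", "reported", "a", "fracture"]
labels = replaced_token_labels(original, corrupted)
```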

Improving Training and Inference of Face Recognition Models via Random Temperature Scaling

Lei Shang, Mouxiao Huang, Wu Shi, Yuchen Liu, Yang Liu, Fei Wang, Baigui Sun, Xuansong Xie, Yu Qiao

Category: Computer Vision | Artificial Intelligence

2022-12-02

Data uncertainty is commonly observed in the images for face recognition (FR). However, deep learning algorithms often make predictions with high confidence even for uncertain or irrelevant inputs. Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples. Taking a probabilistic view of the current classification model, the temperature scalar is exactly the scale of uncertainty noise implicitly added in the softmax function. Meanwhile, the uncertainty of images in a dataset should follow a prior distribution. Based on the observation, a unified framework for uncertainty modeling and FR, Random Temperature Scaling (RTS), is proposed to learn a reliable FR algorithm. The benefits of RTS are two-fold. (1) In the training phase, it can adjust the learning strength of clean and noisy samples for stability and accuracy. (2) In the test phase, it can provide a score of confidence to detect uncertain, low-quality and even OOD samples, without training on extra labels. Extensive experiments on FR benchmarks demonstrate that the magnitude of variance in RTS, which serves as an OOD detection metric, is closely related to the uncertainty of the input image. RTS can achieve top performance on both the FR and OOD detection tasks. Moreover, the model trained with RTS can perform robustly on datasets with noise. The proposed module is light-weight and only adds negligible computation cost to the model.
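
To make the role of the temperature scalar concrete, here is a minimal sketch of a softmax with a fixed temperature. It is not the RTS method itself, which treats the temperature as random; the logits are hypothetical values chosen for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.5]  # hypothetical class scores
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: peaked
flat = softmax_with_temperature(logits, 4.0)   # high temperature: flatter
```

A larger temperature spreads probability mass across classes, which is why it can be read as a measure of how uncertain the prediction is allowed to be.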

CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment

Yuchen Liu, Li-Chia Yang, Alex Pawlicki, Marko Stamenovic

Category: Machine Learning

2022-11-04

Speech quality assessment has been a critical component in many voice communication applications such as telephony and online conferencing. Traditional intrusive speech quality assessment requires the clean reference of the degraded utterance to provide an accurate quality measurement. This requirement limits the usability of these methods in real-world scenarios. On the other hand, non-intrusive subjective measurement is the "gold standard" in evaluating speech quality, as human listeners can intrinsically evaluate the quality of any degraded speech with ease. In this paper, we propose a novel end-to-end model structure called the Convolutional Context-Aware Transformer (CCAT) network to predict the mean opinion score (MOS) of human raters. We evaluate our model on three MOS-annotated datasets spanning multiple languages and distortion types and submit our results to the ConferencingSpeech 2022 Challenge. Our experiments show that CCAT provides promising MOS predictions compared to current state-of-the-art non-intrusive speech assessment models: relative to the baseline model on the challenge evaluation test set, the average Pearson correlation coefficient (PCC) increases from 0.530 to 0.697 and the average RMSE decreases from 0.768 to 0.570.
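
For readers unfamiliar with the two reported metrics, the following sketch computes PCC and RMSE between predicted and human MOS values. The score lists are made-up numbers, not data from the challenge.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical scores: predictions are offset by a constant 0.5.
human_mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_mos = [1.5, 2.5, 3.5, 4.5, 5.5]
pcc = pearson_correlation(human_mos, predicted_mos)
error = rmse(human_mos, predicted_mos)
```

The example shows why both metrics are reported: a constant offset leaves the correlation perfect while the RMSE still registers the error.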

On Grounded Planning for Embodied Tasks with Language Models

Bill Yuchen Lin, Chengsong Huang, Qian Liu, Wenda Gu, Sam Sommerer, Xiang Ren

Category: Artificial Intelligence | Natural Language Processing | Machine Learning | Robotics

2022-08-29

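Language models (LMs) have been shown to possess commonsense knowledge of the physical world, which is fundamental for completing tasks in everyday situations. However, whether LMs can generate grounded, executable plans for embodied tasks remains an open question. This is very challenging because LMs have no "eyes" or "hands" with which to perceive a realistic environment. In this work, we present the first study of this important research question. We first propose a novel problem formulation named G-PlanET, which takes as input a high-level goal and a table of objects in a specific environment. The expected output is a plan consisting of step-by-step instructions for an agent to execute. To enable the study of this problem, we establish an evaluation protocol and design a dedicated metric to assess the quality of plans. In our extensive experiments, we show that adding flattened tables to encode the environment and using an iterative decoding strategy both improve the LMs' grounded planning ability. Our analysis of the results also leads to interesting, non-trivial findings.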
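
One way to picture a flattened object table is as a single string placed next to the goal in the model input. The sketch below is an assumption for illustration only: the `[TABLE]` and `[ROW]` separators, the column names, and the function names are hypothetical and are not taken from the paper.

```python
def flatten_object_table(rows, columns):
    """Serialize a table row by row into one string."""
    header = " | ".join(columns)
    lines = [" | ".join(str(row[c]) for c in columns) for row in rows]
    return " [ROW] ".join([header] + lines)


def build_planner_input(goal, rows, columns):
    """Concatenate the high-level goal with the flattened table."""
    return f"goal: {goal} [TABLE] {flatten_object_table(rows, columns)}"


# Hypothetical environment with two objects.
columns = ["object", "location"]
rows = [
    {"object": "mug", "location": "counter"},
    {"object": "sink", "location": "kitchen"},
]
planner_input = build_planner_input("wash the mug", rows, columns)
```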

3D-FM GAN: Towards 3D-Controllable Face Manipulation

Yuchen Liu, Zhixin Shu, Yijun Li, Zhe Lin, Richard Zhang, S. Y. Kung

Category: Computer Vision

2022-08-24

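3D-controllable portrait synthesis has advanced significantly thanks to breakthroughs in generative adversarial networks (GANs). However, manipulating existing face images with precise 3D control remains challenging. While concatenating GAN inversion with a 3D-aware, noise-to-image GAN is a straightforward solution, it is inefficient and may lead to a noticeable drop in editing quality. To fill this gap, we propose 3D-FM GAN, a novel conditional GAN framework designed specifically for 3D-controllable face manipulation that requires no tuning after the end-to-end learning phase. By carefully encoding both the input face image and a physically based rendering of the 3D edits, our image generator provides high-quality, identity-preserving, 3D-controllable face manipulation. To learn this framework effectively, we develop two essential training strategies and a novel multiplicative co-modulation architecture that improves significantly upon naive schemes. Through extensive evaluations, we show that our method outperforms prior art on various tasks, with better editability, stronger identity preservation, and higher photorealism. In addition, we demonstrate that our design generalizes better to large pose edits and out-of-domain images.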

Court Judgement Labeling Using Topic Modeling and Syntactic Parsing

Yuchen Liu

Category: Machine Learning

2022-08-03

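In regions that practice common law, relevant historical cases are essential references for sentencing. To help legal practitioners find previous judgements more easily, this paper aims to label each court judgement with a set of tags. These tags are legally important for summarizing the judgement and can guide the user to similar judgements. We introduce a heuristic system to solve the problem, which starts from aspect-driven topic modeling and uses dependency parsing and constituency parsing for phrase generation. We also construct a legal term tree for Hong Kong and implement a simplification module to support the system. Finally, we propose a similar-document recommendation algorithm based on the generated tags. It enables users to find similar documents based on a few selected aspects rather than the whole passage. Experimental results show that this system is the best approach for this specific task. It summarizes documents better than a simple term extraction method, and the recommendation algorithm is more effective than full-text comparison approaches. We believe that the system has great potential in law as well as in other fields.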
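
The abstract does not say how tag-based recommendation is scored, so the sketch below substitutes a simple Jaccard overlap between tag sets as a stand-in. The case identifiers and tags are invented for illustration.

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity between two collections of tags."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(query_tags, corpus_tags, top_k=2):
    """Return up to top_k document ids ranked by tag overlap with the query."""
    scored = [(jaccard(query_tags, tags), doc_id) for doc_id, tags in corpus_tags.items()]
    # Highest score first; ties broken alphabetically by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


# Hypothetical judgements and their generated tags.
corpus_tags = {
    "case_a": ["personal injury", "negligence", "damages"],
    "case_b": ["theft", "sentencing"],
    "case_c": ["negligence", "damages"],
}
similar = recommend(["negligence", "damages"], corpus_tags)
```

Because the query is a short list of selected tags, a user can search by a few aspects of a judgement instead of comparing full texts.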

SdAE: Self-distillated Masked Autoencoder

Yabo Chen, Yuchen Liu, Dongsheng Jiang, Xiaopeng Zhang, Wenrui Dai, Hongkai Xiong, Qi Tian

Category: Computer Vision

2022-07-31

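With the development of generative self-supervised learning (SSL) approaches such as BEiT and MAE, how to learn good representations by masking random patches of the input image and reconstructing the missing information has attracted growing attention. However, BEiT and PeCo need a "pre-pretraining" stage to produce the discrete codebooks used to represent masked patches. MAE does not require a codebook pre-training process, but setting pixels as the reconstruction target may introduce an optimization gap between pre-training and downstream tasks: good reconstruction quality may not always lead to high descriptive capability in the model. Considering the above issues, in this paper we propose a simple self-distillated masked autoencoder network, namely SdAE. SdAE consists of a student branch that uses an encoder-decoder structure to reconstruct the missing information, and a teacher branch that produces latent representations of the masked tokens. We also analyze, from the perspective of the information bottleneck, how to build good views for the teacher branch to produce latent representations. We then propose a multi-fold masking strategy that provides multiple masked views with balanced information to boost performance, which can also reduce computational complexity. Our approach generalizes well: with only 300 epochs of pre-training, a vanilla ViT-Base model achieves 84.1% fine-tuning accuracy on ImageNet-1K classification, 48.6 mIoU on ADE20K segmentation, and 48.9 mAP on COCO detection, surpassing other methods by a considerable margin. Code is available at https://github.com/abrahamyabo/sdae.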
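
The random patch masking that these methods share can be sketched in a few lines. This is a generic illustration, not the multi-fold strategy of SdAE; the patch count of 196 (a 14 x 14 grid) and the 0.75 mask ratio are assumed values.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into visible and masked sets at random."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)


# Assumed setup: 196 patches, three quarters of them masked.
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
```

Only the visible patches would be fed to the encoder; the masked indices mark the positions whose content has to be predicted.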

Emotion Recognition based on Multi-Task Learning Framework in the ABAW4 Challenge

Tenggan Zhang, Chuanhe Liu, Xiaolong Liu, Yuchen Liu, Liyu Meng, Lei Sun, Wenqiang Jiang, Fengyuan Zhang

Category: Computer Vision

2022-07-19

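This paper presents our submission to the Multi-Task Learning (MTL) Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition. Based on visual feature representations, we use three types of temporal encoders to capture the temporal context information in the video: a transformer-based encoder, an LSTM-based encoder, and a GRU-based encoder. With the temporal context-aware representations, we employ a multi-task framework to predict the valence, arousal, expression, and action unit (AU) values of the images. In addition, smoothing is applied to refine the initial valence and arousal predictions, and a model ensemble strategy is used to combine multiple results from different model setups. Our system achieves a performance of 1.742 on the MTL Challenge validation dataset.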
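
The abstract does not specify which smoothing is applied, so the following sketch assumes a centered moving average over per-frame predictions. The valence values and the window size are hypothetical.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at the sequence edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical jittery per-frame valence predictions.
valence = [0.0, 0.9, 0.0, 0.9, 0.0]
smooth_valence = moving_average(valence, window=3)
```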

Fully Decentralized Model-based Policy Optimization for Networked Systems

Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Category: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-07-13

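Reinforcement learning algorithms require a large number of samples; this often limits their real-world application even on simple tasks. The challenge is more pronounced in multi-agent tasks, since each step of operation is more costly, requiring communication, movement, or resources. This work aims to improve the data efficiency of multi-agent control through model-based learning. We consider networked systems in which agents cooperate and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions through communication; the policies are then trained on model rollouts. To alleviate the bias of model-generated data, we restrict model usage to generating myopic rollouts, thereby reducing the compounding error of model generation. To preserve the independence of policy updates, we introduce an extended value function and prove theoretically that the resulting policy gradient is a close approximation of the true policy gradient. We evaluate our algorithm on several benchmarks for intelligent transportation systems: connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods that use true models.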
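
A myopic rollout can be pictured as running a learned model forward for only a few steps, so that prediction errors have little room to compound. The sketch below is a single-agent toy under that reading; `toy_model` and `toy_policy` are hypothetical stand-ins, and none of the communication or extended value function from DMPO is shown.

```python
def short_rollout(model, policy, start_state, horizon):
    """Roll a dynamics model forward for a small, fixed number of steps."""
    state = start_state
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = model(state, action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


# Hypothetical stand-ins for a learned model and a policy.
def toy_model(state, action):
    next_state = state + action
    return next_state, -abs(next_state)  # reward: stay close to zero


def toy_policy(state):
    return -1 if state > 0 else 1


rollout = short_rollout(toy_model, toy_policy, start_state=3, horizon=2)
```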

TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask

Yuchen Su, Zhiwen Shao, Yong Zhou, Fanrong Meng, Hancheng Zhu, Bing Liu, Rui Yao

Category: Computer Vision

2022-06-27

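Arbitrary-shaped scene text detection is a challenging task due to the wide variation of text in font, size, color, and orientation. Most existing regression-based methods regress the masks or contour points of text regions to model text instances. However, regressing complete masks incurs high training complexity, and contour points are not sufficient to capture the details of highly curved text. To address these limitations, we propose a novel lightweight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode text masks as compact vectors. Furthermore, considering the imbalanced number of training samples across pyramid layers, we employ only a single-level head for top-down prediction. To model multi-scale text in the single-level head, we introduce a novel positive sampling strategy that treats the shrunk text region as positive samples, and we design a feature awareness module (FAM) for spatial awareness and scale awareness that fuses rich contextual information and focuses on more significant features. In addition, we propose a segmented non-maximum suppression (S-NMS) method that filters out low-quality mask regressions. Extensive experiments on four challenging datasets demonstrate that TextDCT achieves competitive performance in both accuracy and efficiency. Specifically, TextDCT achieves an F-measure of 85.1 at 17.2 frames per second (FPS) on CTW1500 and an F-measure of 84.9 at 15.1 FPS on Total-Text.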
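
The idea of encoding a mask as a compact DCT vector can be sketched with a small pure-Python 2D DCT. This is a simplified illustration, not TextDCT's pipeline: it keeps a square low-frequency block, and the 8 x 8 mask, the block sizes, and the function names are assumptions.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def encode_mask(mask, keep):
    """2D DCT of a square mask, keeping the top-left keep x keep block."""
    c = dct_matrix(len(mask))
    coeffs = matmul(matmul(c, mask), transpose(c))
    return [row[:keep] for row in coeffs[:keep]]


def decode_mask(block, size):
    """Zero-pad the kept coefficients, invert the DCT, threshold at 0.5."""
    keep = len(block)
    coeffs = [[block[i][j] if i < keep and j < keep else 0.0
               for j in range(size)] for i in range(size)]
    c = dct_matrix(size)
    recon = matmul(matmul(transpose(c), coeffs), c)
    return [[1 if v >= 0.5 else 0 for v in row] for row in recon]


# Hypothetical 8 x 8 binary text mask containing one rectangle.
mask = [[1 if 2 <= r <= 5 and 1 <= col <= 6 else 0 for col in range(8)]
        for r in range(8)]
compact = encode_mask(mask, keep=4)                     # 16 numbers instead of 64
restored = decode_mask(encode_mask(mask, keep=8), size=8)  # lossless when all kept
```

Keeping fewer coefficients gives a shorter vector at the cost of an approximate mask, which is the trade-off a compact mask encoding has to balance.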
